Scaling Distributed File Systems in Resource-Harvesting Datacenters
Abstract
Datacenters can use distributed file systems to store data for batch processing on the same servers that run latency-critical services. Exploiting this storage capacity requires minimizing interference with the co-located services while providing user-friendly, efficient, and scalable file system access. Unfortunately, current systems fail one or more of these requirements and must be manually partitioned across independent subclusters. In this paper, we introduce techniques for automatically and transparently scaling such file systems to entire resource-harvesting datacenters. We create a layer of software in front of the existing metadata managers, assign servers to subclusters to minimize interference and data movement, and smartly migrate data across subclusters in the background. We implement our techniques in HDFS and evaluate them using simulations of 10 production datacenters and a real 4k-server deployment. Our results show that these techniques deliver high file access performance and high data durability and availability while migrating only a limited amount of data. We recently deployed our system on 30k servers in Bing’s datacenters and discuss lessons from this deployment.
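The abstract describes a software layer placed in front of the existing metadata managers, plus background migration of data across subclusters. As a rough illustration only, the Python sketch below assumes that layer keeps a mount table mapping path prefixes to subcluster IDs. The class `MountTable`, its methods, and the subcluster names are hypothetical and are not taken from the paper or from any HDFS API. The sketch resolves a path by longest-prefix match on whole path components and repoints a prefix to a new subcluster once its data has been copied.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Hypothetical routing table for a layer in front of per-subcluster
    metadata managers. Maps path prefixes to subcluster IDs."""
    entries: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _norm(prefix: str) -> str:
        # Drop a trailing slash so "/logs/" and "/logs" are the same mount point.
        return prefix.rstrip("/") or "/"

    def add(self, prefix: str, subcluster: str) -> None:
        self.entries[self._norm(prefix)] = subcluster

    def resolve(self, path: str) -> str:
        """Return the subcluster owning `path` using longest-prefix match
        on whole path components ("/logsx" does not match "/logs")."""
        best_prefix, best_sc = None, None
        for prefix, sc in self.entries.items():
            matches = prefix == "/" or path == prefix or path.startswith(prefix + "/")
            if matches and (best_prefix is None or len(prefix) > len(best_prefix)):
                best_prefix, best_sc = prefix, sc
        if best_sc is None:
            raise KeyError(f"no mount point covers {path}")
        return best_sc

    def migrate(self, prefix: str, target: str) -> None:
        """Repoint an existing mount point to `target`. Assumption: the caller
        invokes this only after the background copy of the data has finished."""
        key = self._norm(prefix)
        if key not in self.entries:
            raise KeyError(f"unknown mount point {prefix}")
        self.entries[key] = target
```

For example, after `add("/", "sc0")`, `add("/logs", "sc1")`, and `add("/logs/bing", "sc2")`, resolving `/logs/bing/2017/part-0` yields `sc2`, while `/logsx` falls back to `sc0` because matching is on whole path components. Switching the mapping only after the copy completes is one simple way to keep clients routed to a subcluster that actually holds the data.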
Publication date: 2017